PREFACE


This  report  documents  research  conducted  on  the  acceptability  of  performance  ratings  as 
part  of  the  Air  Force  Job  Performance  Measurement  project.  Portions  of  this  research  were 
completed  under  prime  contract  number  F41689-84-D-0001  with  Universal  Energy  Systems  for 
the  Training  Systems  Division  of  the  Air  Force  Human  Resources  Laboratory,  Brooks  Air  Force 
Base, TX. This paper was completed under in-house Work Unit No. 1121-12-00. Some of these
results  were  presented  at  the  annual  meeting  of  the  Society  for  Industrial  and  Organizational 
Psychology,  Montreal,  CA,  April,  1992. 


EXPLORING  THE  CONCEPT  OF  ACCEPTABILITY  AS  A  CRITERION 
FOR  EVALUATING  PERFORMANCE  MEASURES 


SUMMARY 

This  study  examined  raters'  reactions  to  the  use  of  several  different  rating  forms,  and  the 
notion  of  using  acceptability  as  a  criterion  for  evaluating  appraisal  systems  or  techniques.  A  total  of 
1581 enlisted Air Force job incumbents completed self and peer job performance ratings, in
conjunction  with  ratings  by  522  supervisors  of  those  incumbents.  Questionnaires  were  administered  to 
determine  rater  perceptions  of  rating  form  acceptability  and  factors  related  to  acceptability.  Factor 
analyses  identified  a  number  of  interpretable  factors  related  to  acceptability,  motivation,  job 
satisfaction,  situational  constraints,  and  rater  trust.  Regression  analyses  indicated  that  motivation  to 
rate,  trust  in  others,  and  situational  constraints  were  predictive  of  acceptability  for  both  supervisors  and 
job  incumbents.  ANOVA  and  post-hoc  tests  indicated  differences  in  acceptability  across  rating  sources 
and rating forms. Supervisors' perceptions were more favorable than incumbents', and a task-level
rating  form  was  significantly  less  acceptable  to  all  raters.  Results  are  discussed  in  terms  of  usefulness 
of  an  acceptability  criterion  in  applied  research. 

I. INTRODUCTION

Research  on  the  measurement  of  job  performance  remains  a  topic  of  considerable  interest  in 
the  industrial/organizational  psychology  literature,  and  conceptual  and  methodological  advances 
continue  to  be  made  (Borman,  1991).  While  criterion  measurement  is  essential  for  almost  any 
personnel  research  application,  choosing  an  adequate  criterion  or  set  of  criteria  remains  a  relatively 
casual  process.  As  a  number  of  researchers  (e.g.,  Kavanagh,  1982;  Sulsky  &  Balzer,  1992)  have  noted, 
there are multiple criteria that can and should be used for judging the quality of measurement
instruments,  procedures,  and  systems. 

Over  the  years  researchers  have  identified  and/or  developed  many  examples  of  "criteria  for 
criteria"  as  standards  on  which  to  assess  the  quality  of  criterion  measures  (Weitz,  1961).  Bellows 
(1961)  suggested  that  criterion  measures  be  reliable,  realistic,  representative,  related  to  other  criteria, 
acceptable  to  the  job  analyst,  acceptable  to  management,  consistent  from  one  situation  to  another,  and 
predictable. Blum and Naylor (1968) proposed that criterion measures should also be inexpensive,
understandable, measurable, relevant, uncontaminated and bias-free, and discriminating. Bernardin and
Beatty  (1984)  compiled  a  large  list  of  variables  and  clustered  these  variables  into  three  primary 
categories  of  criteria:  quantitative  (e.g.,  reliability,  validity,  discriminability),  utilization  (e.g.,  feedback, 
merit  pay,  adverse  impact),  and  qualitative  (e.g.,  amount  of  documentation,  user  acceptability, 
maintenance  costs). 

While  researchers  have  periodically  called  attention  to  the  availability  of  multiple  criteria  to 
judge criteria, operational definitions of these variables are seldom supplied. If some form
of empirical evaluation is included, it is typically dominated by reliability and validity considerations.
Several  researchers  have  attempted  to  apply  a  more  systematic  process  to  the  use  of  multiple  criteria  to 
judge  performance  measures. 

McAfee and Green (1977) described a set of 16 criteria they applied to aid the selection of a
performance appraisal method for use in a large midwestern hospital. Ten different appraisal methods
were rated on criteria such as usefulness for counseling and employee development, expense to develop,
reliability,  and  freedom  from  psychometric  errors.  McAfee  and  Green  then  rated  each  method  on  each 
criterion,  and  used  a  weighted  sum  to  identify  the  best  method  for  the  job  and  organization  under 
consideration. 
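
As a minimal sketch of the weighted-sum step described above, the following Python fragment scores two appraisal methods on three criteria. The method names, criterion weights, and ratings are invented for illustration and are not McAfee and Green's data; only the idea of rating each method on each criterion and summing the weighted ratings comes from the text.

```python
# Hypothetical weighted-sum comparison of appraisal methods.
# All names, weights, and ratings below are invented for illustration.

def weighted_scores(ratings, weights):
    """Return {method: sum over criteria of weight * rating}."""
    return {
        method: sum(weights[criterion] * rating
                    for criterion, rating in by_criterion.items())
        for method, by_criterion in ratings.items()
    }

# Importance weight assigned to each criterion (assumed values).
weights = {
    "reliability": 3,
    "expense_to_develop": 1,
    "usefulness_for_counseling": 2,
}

# Rating of each method on each criterion (assumed values, higher is better).
ratings = {
    "method_a": {"reliability": 4, "expense_to_develop": 2,
                 "usefulness_for_counseling": 3},
    "method_b": {"reliability": 3, "expense_to_develop": 5,
                 "usefulness_for_counseling": 4},
}

scores = weighted_scores(ratings, weights)
best = max(scores, key=scores.get)  # method with the largest weighted sum
print(scores, best)
```

In this sketch a higher rating is assumed to be better on every criterion, so a cost criterion such as expense to develop would need to be rated in the favorable direction before weighting.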

Drawing  on  the  work  of  McAfee  and  Green  (1977),  Kavanagh  (1980,  1982)  proposed  a  list 
of  19  criteria  against  which  to  judge  the  value  of  performance  appraisal  systems.  Each  of  these  criteria 
was  operationally  defined,  and  included  psychometric  quality,  developmental  costs,  user  acceptance, 
periodic review/feedback, meets EEOC guidelines, and susceptibility to inflation of ratings. User
acceptance, or acceptability, was seen as critical to the appraisal system's effect on employee motivation
and  management  control. 

In  the  remainder  of  this  paper  we  intend  to  examine  more  fully  the  concept  of  acceptability, 
and  demonstrate  how  it  may  be  used  as  a  criterion  to  evaluate  the  worth  of  a  performance  appraisal 
system  or  technique. 

Attitudes  About  Performance  Appraisal 

Recent  reviews  of  performance  appraisal  have  emphasized  a  broader  focus  on  criteria. 
Dickinson  (1993)  reviewed  the  literature  on  attitudes  about  performance  appraisal,  and  suggested  that 
if  negative  attitudes  about  performance  appraisal  prevail  among  organizational  members,  performance 
appraisal  will  be  unacceptable  to  many  members,  and  its  use  may  hinder  rather  than  help  achieve 
outcomes.  In  addition,  Dickinson's  review  supported  Lawler's  contention  that  appraisal  system 
characteristics, the individual, and the organization are all determinants of attitudes about performance
appraisal. 

Murphy and Cleveland (1991) noted that the dominance of psychometric and accuracy
criteria has diverted researchers' attention away from three classes of criteria that might be critical in
determining  the  success  of  an  appraisal  system,  namely,  1)  reactions,  2)  practicality,  and  3)  decision 
process  criteria. 

They  argue  that  reaction  criteria  (such  as  perceptions  of  fairness  and  accuracy  of  appraisal 
systems)  probably  place  a  ceiling  on  the  possible  effectiveness  of  the  system,  since  acceptance  of  the 
system  by  raters  and  ratees  may  be  necessary  but  not  sufficient  for  the  system  to  be  effective. 
Practicality  criteria  such  as  time  commitment,  cost,  political  acceptability,  and  ease  of  installation 
are  also  cited  by  Murphy  and  Cleveland  as  useful  but  neglected  criteria.  Finally,  the  contribution  of  a 
performance  appraisal  system  to  the  decision  making  process  should  be  considered  as  well,  both  in 
terms  of  the  degree  to  which  decisions  are  accepted  by  members  of  the  organization  and  the  degree  to 
which  decisions  are  facilitated  by  the  performance  appraisal  system. 

Organizational Justice

A  related  body  of  literature  that  has  received  renewed  attention  recently  is  organizational 
justice  (Thibaut  &  Walker,  1975),  particularly  with  its  translation  into  performance  appraisal  terms 
(e.g.,  Greenberg,  1986a).  Studies  on  perceptions  of  justice  and  fairness  in  organizations  are  directed  at 
identifying  the  features  of  organizational  procedures  that  affect  perceptions  of  fairness,  work  attitudes, 
and  behavior.  The  literature  suggests  there  are  two  dimensions  of  perceived  justice  to  any  policy  - 
distributive  justice  and  procedural  justice. 

Distributive  justice  refers  to  normative  standards  for  evaluating  the  fairness  of  the  allocation 
of  outcomes  between  parties  (Leventhal,  1976).  Distributive  justice  interpreted  in  performance 
appraisal  terms  focuses  on  the  fairness  of  the  evaluations  received  relative  to  the  work  performed. 
Distributive  justice  comes  into  play  when  evaluation  decisions  are  emphasized  as  a  means  to  an  end, 
for  example,  when  persons  view  performance  ratings  as  a  means  of  obtaining  promotions,  salary 
increases,  etc.  Distributive  fairness  is  reflected  in  equitable  distribution  of  reward  outcomes  across 
persons  rather  than  in  the  determination  of  performance  ratings. 

Procedural  justice  refers  to  normative  standards  for  evaluating  the  manner  in  which  a  decision 
is  reached.  Procedural  justice  related  to  performance  appraisal  focuses  on  the  fairness  of  the  evaluation 
procedures  used  to  determine  the  ratings.  In  other  words,  as  Greenberg  (1986a)  suggests,  beliefs 
about  fair  performance  evaluations  may  be  based  on  the  procedures  by  which  the  evaluations  are 
determined  apart  from  the  evaluation  received.  Thus,  procedural  fairness  comes  into  play  when 
performance  evaluations  are  considered  as  "ends  in  themselves"  (Greenberg,  1986b).  Procedural 
fairness  is  reflected  in  the  perceived  validity  of  performance  measurement  procedures  and  the 
opportunity  for  employees  to  provide  a  complete  picture  of  their  performance  to  supervisors  before  the 
evaluation.  Thus,  behaviors  and  components  of  job  performance  that  contribute  to  evaluations  are 
inputs. 

Within  this  broad  organizational  framework,  the  current  study  would  be  classified  as 
addressing  performance  appraisal  issues  related  to  procedural  justice.  Because  our  study  was  for 
research  purposes  only,  rating  outcomes  were  not  of  primary  concern,  but  rather  how  rater  attitudes 
about  appraisal  related  to  the  processes  and  procedures  of  appraisal,  as  well  as  to  other  salient 
variables. 

Performance  Appraisal  Acceptability 

In  spite  of  the  common  sense  logic  that  acceptance  of  a  personnel  procedure  is  crucial  to  its 
effective  use,  it  was  not  until  1967  that  Lawler  noted  that  attitudes  toward  performance  ratings  could 
affect  their  validity.  Lawler  (1967)  proposed  a  model  of  the  factors  that  affect  the  construct  validity  of 
ratings. Central to the model was the belief that attitudes toward the equity and acceptability of a
rating  system  are  a  function  of  organizational  and  individual  characteristics,  as  well  as  the  rating  format. 

Landy, Barnes, and Murphy (1978) were among the first researchers to empirically examine
attitudinal  factors  as  they  relate  to  job  performance  measurement.  They  identified  four  significant 
predictors  of  perceived  fairness  and  accuracy  of  performance  appraisals:  (a)  frequency  of  appraisal,  (b) 
plans  developed  with  the  supervisor  for  eliminating  weaknesses,  (c)  supervisor's  knowledge  of  the 
ratee's  job  duties,  and  (d)  supervisor's  knowledge  of  the  ratee's  level  of  performance.  In  a  follow-up 
study  with  the  same  population,  the  level  of  the  performance  rating  did  not  affect  these  relationships 
(Landy, Barnes-Farrell, & Cleveland, 1980).

Dipboye  and  de  Pontbriand  (1981)  distinguished  between  employees'  opinions  of  their 
performance  appraisal  system  and  employees'  opinions  of  the  appraisal  itself.  They  found  that  four 
factors  related  to  the  two  dependent  variables:  (a)  favorability  of  the  appraisal,  (b)  opportunity  for 
employees  to  state  their  own  perspective  in  the  appraisal  interview,  (c)  job  relevance  of  appraisal 
factors,  and  (d)  discussion  of  plans  and  objectives  with  the  supervisor. 

A  series  of  studies  by  Kavanagh  and  colleagues  extended  the  examination  of  users' 
perceptions  of  performance  appraisal  systems  (Hedge,  1983;  Kavanagh  &  Hedge,  1983;  Kavanagh, 
Hedge, Ree, Earles, & DeBiasi, 1984). Although users' attitudes toward the appraisal form and the
broader  concept  of  the  appraisal  system  did  not  seem  to  differ  (i.e.,  virtually  identical  regression  models 
were  found),  several  attitudes  toward  the  appraisal  system  were  significant  predictors  of  appraisal 
acceptability  across  studies.  These  included  attitudes  about  whether:  (a)  the  appraisal  system 
facilitates fair and accurate appraisals, (b) the appraisal system allows raters to distinguish between
workers'  proficiencies,  (c)  the  appraisal  system  provides  clear  performance  standards,  (d)  ratees  receive 
satisfactory  feedback,  and  (e)  ratees  receive  a  satisfactory  performance  evaluation. 

While the studies by Hedge and Kavanagh (1983) and Kavanagh et al. (1984) focused
exclusively  on  factors  related  to  acceptability,  Hedge  (1983)  used  an  acceptability  measure  (i.e.,  how 
acceptable  do  you  find  your  current  performance  appraisal  system?),  in  conjunction  with  more 
traditional  performance  appraisal  criterion  measures  to  evaluate  the  implementation  of  a  new 
performance  appraisal  system  at  a  large  hospital.  He  discovered  that  ratees  found  the  new  appraisal 
system  more  acceptable  than  the  system  previously  in  use. 

The  objective  of  the  present  research  was  to  focus  on  one  criterion,  acceptability,  that  has 
been  relatively  under-investigated,  develop  an  operational  definition  for  that  variable,  and  follow  a 
systematic procedure for collecting and evaluating such data. While Kavanagh (1980, 1982),
Bernardin and Beatty (1984), and others have discussed acceptability as an important criterion, it has
rarely  been  used. 

The  present  study  had  four  interrelated  purposes.  Because  of  the  scarcity  of  research  that  uses 
or  examines  the  use  of  reaction  criteria,  our  first  purpose  was  to  extend  the  development  of  a  reaction 
criterion  beyond  what  has  been  done  to  date.  Early  research  focused  on  single-item  measures  (e.g., 
Landy  et  al.,  1978;  Landy  et  al.,  1980)  of  perceived  fairness  and  accuracy.  Other  researchers  (Dobbins, 
Cardy,  &  Platz-Vieno,  1990;  Giles  &  Mossholder,  1990)  chose  satisfaction  with  appraisal  as  their 
single-item  reaction  measure,  arguing  that  a  satisfaction  criterion  assesses  both  fairness  cognitions  and 
affect, thus offering a broader indicator of appraisal reactions (Giles & Mossholder, 1990). Dipboye and
de Pontbriand (1981) used a 3-item measure of satisfaction and understanding of the appraisal process,
and a 4-item measure of whether the appraisal system facilitates employee evaluation. Kavanagh et al.
(1984)  focused  on  both  the  system  and  the  form,  broadening  the  concept  to  emphasize  overall 
acceptability,  but  once  again  using  single-item  measures. 

Following the advice of Kavanagh (1982), Bernardin and Beatty (1984), and Murphy and
Cleveland  (1991),  we  chose  to  focus  on  the  construct  of  acceptability  as  the  measure  that  would  best 
capture  reactions  to  appraisal.  Building  on  previous  research  findings  on  reactions  to  appraisal,  items 
were  written  to  reflect  broadly  the  concept  of  acceptability,  including:  a)  facilitates  identification  of 
performance differences between employees, b) facilitates capturing the true picture of job
performance,  c)  overall  acceptability  of  the  form,  d)  ease  of  form  use  and  understanding,  e)  facilitates 
confidence  in  ratings,  and  f)  facilitates  fair  evaluation  of  performers. 

A  second  purpose  of  the  study  was  to  examine  the  relationship  between  perceptions  of 
appraisal  acceptability  and  variables  both  internal  and  external  to  the  appraisal  process.  Because  of  the 
validation  research  purpose  of  our  study,  variables  prevalent  in  previous  appraisal  reaction  studies  (e.g., 
setting  performance  objectives,  devising  action  plans,  counseling  employees,  discussing  salary  issues) 
were irrelevant and thus not included. We did, however, identify two appraisal process factors from
the literature that seemed relevant to our study: rater trust and rater motivation.

Rater  motivation  has  been  largely  ignored  by  performance  appraisal  researchers.  Although 
DeCotiis  and  Petit  (1978)  incorporated  rater  motivation  as  an  important  part  of  their  model  of  the 
appraisal  process,  they  cited  only  Taft's  (1971)  theory  of  interpersonal  judgments  as  support  for  the 
inclusion of this variable in their model. Recently, Bernardin and his colleagues (Bernardin & Cardy,
1982; Bernardin, Orban, & Carlyle, 1981) focused on rater motivation, but only in terms of how it
might be affected by the level of trust a rater has in the appraisal system. Bernardin, Orban, and Carlyle
(1981)  developed  a  measure  they  labeled  "trust  in  the  appraisal  process,"  and  found  that  both  trust  and 
motivation  were  linked  to  the  perceptions  of  fairness  and  accuracy  of  appraisal.  Consequently,  for  the 
current  study,  items  were  written  to  tap  facets  of  these  factors  including:  a)  general  motivation  to  rate, 
b)  motivation  to  rate  accurately,  c)  rater  trust  in  the  appraisal  process,  d)  trust  in  other  raters,  and  e) 
trust  in  researchers. 

Past  research  within  the  performance  appraisal  domain  has  also  identified  variables  that  appear 
to  have  relevance  for  our  study.  We  focused  on  three  particular  variables  that  could  impact  ratings. 
Peters  and  O'Connor  (1980)  hypothesized  that  constraints  on  performance  may  lead  to  lower 
effectiveness levels, and some support has been found for such a notion (e.g., O'Connor, Peters,
Rudolf, & Pooyan, 1984; Olson & Borman, 1989). Extending this logic to raters, it was hypothesized
that  situational  constraints  may  affect  raters'  ability  to  rate  accurately,  thereby  affecting  perception  of 
appraisal  acceptability,  and  items  were  written  to  tap  constraints  related  to  tool  availability  and  job 
manual  availability  and  clarity. 

Two  other  sets  of  items  were  also  developed  to  examine  the  influence  of  other  external 
variables  on  performance  appraisal  acceptability.  The  two  factors  that  appeared  to  have  some 
relevance,  and  had  been  used  in  past  research  studies  were  supervisory  support  and  job  satisfaction. 

Dickinson (1993) noted that perhaps the single most important determinant of employee
attitudes  about  performance  appraisal  is  the  supervisor.  He  suggested  that  when  the  supervisor  is  seen 
as  trustworthy  and  supportive,  then  attitudes  about  performance  appraisal  are  favorable.  In  addition, 
Olson  and  Borman  (1989)  found  relationships  between  supervisory  support  and  job  performance, 
suggesting that supervisory support could be related to attitudes about performance appraisal.

Similarly,  while  only  modest  relationships  have  been  found  between  job  satisfaction  and  job 
performance  (e.g.,  Iaffaldano  &  Muchinsky,  1985;  Podsakoff  &  Williams,  1986),  we  felt  it  would  be 
useful  to  explore  whether  attitudes  about  the  job  would  be  related  to  attitudes  about  appraisal  system 
acceptability.  For  example,  Giles  and  Mossholder  (1990)  found  modest  relationships  between  job 
satisfaction  and  satisfaction  with  the  appraisal  system. 

A  third  purpose  of  the  present  study  was  to  examine  the  link  between  rating  source  and 
performance  appraisal  acceptability.  While  previous  studies  of  performance  appraisal  attitudes  have 
almost  exclusively  focused  on  ratee  reactions,  the  focus  of  our  study  was  on  the  reactions  of  the  raters 
to  the  forms  they  had  been  asked  to  use.  In  addition,  because  both  job  incumbents  and  supervisors 
were  asked  to  provide  performance  ratings  and  responses  to  other  attitudinal  questions,  we  were  able 
to  examine  whether  the  variables  associated  with  appraisal  acceptability  differed  by  rating  source,  and 
whether  levels  of  appraisal  acceptability  differed  by  rating  source. 

A  fourth  purpose  of  our  research  was  to  examine  whether  rater  acceptability  differed  across 
rating forms. As noted earlier, McAfee and Green (1977) evaluated 10 appraisal methods against a list
of 16 criteria as a way to select an appraisal method for nurses in a hospital. Relying on their own
knowledge of the different methods, they rated the effectiveness of the methods on the 16 criteria and
arrived  at  a  final  decision  about  which  type  of  appraisal  method  to  use.  Kavanagh  (1982)  also 
recommended  that  such  a  procedure  be  used,  but  to  our  knowledge,  no  published  study  has  gathered 
attitudinal  information  (from  the  individuals  who  would  be  asked  to  use  the  forms)  as  one  component 
of  the  measurement  method  selection  process.  Thus,  a  final  aim  of  our  research  study  was  to  collect 
data  on  rater  attitudes  about  the  acceptability  of  four  separate  performance  appraisal  forms  that  had 
been developed for possible use in a validation project.

In  summary,  there  is  little  empirical  research  concerning  perceptions  of  appraisal  system 
acceptability.  The  purpose  of  the  present  research  was  to  identify  and  clarify  the  construct  of  appraisal 
acceptability,  and  examine  factors  related  to  this  acceptability  construct.  We  also  wanted  to  examine 
whether  attitudes  about  acceptability  differ  across  rating  sources  and  rating  forms. 


II. METHOD


Background 

Between 1984 and 1989 the Air Force Human Resources Laboratory¹ conducted a
large-scale research project to develop a variety of performance measures for use in the validation of
selection and classification tests and evaluation of training programs (Hedge & Teachout, 1986;
Teachout & Pellum, 1990). As part of this project, a variety of different rating forms were developed
to evaluate the job performance of enlisted personnel in their first four years of military service.

¹ This is now the United States Air Force Armstrong Laboratory, Human Resources Directorate.

Participants 

Personnel  from  seven  Air  Force  specialties  participated  in  this  research.  These  specialties 
included  Air  Traffic  Control  Operator,  Aircrew  Life  Support  Specialist,  Information  Systems  Radio 
Operator,  Aerospace  Ground  Equipment  Mechanic,  Personnel  Specialist,  Precision  Measurement 
Equipment  Laboratory  Specialist,  and  Avionic  Communications  Specialist.  A  total  of  1581  job 
incumbents (ratees and peers²) and 522 supervisors completed self, peer, or supervisor ratings (5530
ratings  were  completed),  as  well  as  Background  and  Rating  Form  Questionnaires.  Job  incumbents 
averaged  27.5  months  of  Total  Active  Federal  Military  Service;  79.0%  were  male,  and  75.4%  were 
Caucasian. 

Questionnaires 

Two  questionnaires  were  developed  to  gather  information  from  job  incumbents  and 
supervisors  both  before  they  made  ratings  (using  a  Background  Questionnaire),  and  after  they  made 
ratings  (using  a  Rating  Form  Questionnaire).  The  Background  Questionnaire  included  10  items 
hypothesized to measure three different constructs. Three items measured situational constraints (e.g.,
"The technical manuals and other written materials that I use in my job are available when I need
them."). Five items measured job satisfaction (e.g., "I get a sense of accomplishment from my job.").
Two items measured supervisory support (e.g., "I feel that my supervisor gives me the support I need
to do my job.").

The Rating Form Questionnaire contained 20 items hypothesized to measure three constructs.
Seven items measured the rater's motivation to rate (e.g., "How motivated were you to complete the
rating  forms?";  "Did  you  make  an  'extra  effort'  to  carefully  pay  attention  to  all  of  the  instructions  and 
examples  in  order  to  make  accurate  ratings?").  Seven  items  measured  rater  trust  in  the  appraisal 
process,  in  other  raters,  and  in  the  researchers  conducting  the  research  (e.g.,  "Will  your  supervisor  have 
access  to  any  information  about  you  collected  from  the  rating  forms?";  "Do  you  believe  other  persons 
involved  really  tried  to  follow  the  rules  in  completing  their  ratings?";  "Do  you  believe  that  the  true 
purpose  of  the  ratings  was  the  one  explained  to  you  during  the  rater  orientation?";  several  of  these 
items were similar to those used by Bernardin, Orban, & Carlyle, 1981).

Six  items  measured  acceptability  of  the  appraisal  process.  The  six  acceptability  items  were 
designed  to  tap  perceptions  of  appraisal  form  (a)  fairness,  (b)  clarity  of  instructions,  (c)  contributions  to 
rating  accuracy,  (d)  contributions  to  discrimination  between  ratees,  (e)  overall  acceptability  to  raters, 
and  (f)  confidence  raters  had  in  their  ratings.  Raters  responded  to  the  same  six  acceptability  questions 
for  each  of  the  four  rating  forms. 
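
The item counts reported above can be tallied in a short Python sketch. The dictionary keys are labels chosen for this illustration and are not the authors' variable names; the counts and the four form names are taken from the text.

```python
# Item counts per hypothesized construct, as reported in the text.
# Keys are illustrative labels, not names used by the authors.
BACKGROUND_ITEMS = {
    "situational_constraints": 3,
    "job_satisfaction": 5,
    "supervisory_support": 2,
}
RATING_FORM_ITEMS = {
    "motivation_to_rate": 7,
    "rater_trust": 7,
    "acceptability": 6,
}
RATING_FORMS = ["Task", "Dimensional", "Air Force-wide", "Global"]


def acceptability_responses_per_rater(items=RATING_FORM_ITEMS, forms=RATING_FORMS):
    """Acceptability items are answered once for each rating form."""
    return items["acceptability"] * len(forms)


background_total = sum(BACKGROUND_ITEMS.values())    # Background Questionnaire items
rating_form_total = sum(RATING_FORM_ITEMS.values())  # Rating Form Questionnaire items
print(background_total, rating_form_total, acceptability_responses_per_rater())
```

Because the six acceptability items were answered once per form, each rater supplied 24 acceptability responses under this reading.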

² Job incumbents could be asked to provide a self rating, a peer rating, or both types of ratings.
However, regardless of the rating requirements, all job incumbents were in their first four years of
military service.

Scales for all items were five-point, adjectivally anchored graphic rating scales. Across the
seven specialties, Background and Rating Form Questionnaire items were identical, with one
exception.  The  Air  Traffic  Control  Operator  Background  Questionnaire  omitted  one  "constraint  item" 
that  asked  about  tool  and  equipment  availability. 

Rating  Forms 

A  series  of  four  rating  forms  were  developed  to  measure  job  performance.  All  rating  forms 
were  constructed  using  a  5-point,  adjectivally  anchored  rating  scale.  In  addition,  specific  behavioral 
examples  were  included  for  three  of  the  four  rating  forms  to  provide  detailed  information  to  assist  the 
raters  in  making  accurate  judgments. 

Task  Rating  Form.  This  form  consisted  of  a  comprehensive  listing  of  tasks  representative  of 
the  job  content  domain.  Task  identification  was  based  on  an  extensive  stratified  random  sampling  plan 
(Lipscomb  &  Dickinson,  1988)  that  used  information  obtained  from  the  Air  Force's  Occupational 
Survey  Program  (Christal,  1974).  The  relative  amount  of  time  spent  performing  these  tasks,  learning 
difficulty,  and  emphasis  given  to  the  tasks  in  training  were  used  to  select  a  representative  set  of  tasks. 
The  number  of  tasks  included  on  a  Task  Rating  Form  varied  between  25  and  40  across  the  seven  Air 
Force  specialties.  Ratings  were  made  on  a  5-point  graphic  rating  scale,  with  numerical  and  adjectival 
anchors at each of these five points. The scale ranged from "1" (never meets acceptable level of
proficiency) to "5" (always exceeds acceptable level of proficiency).

Dimensional  Rating  Form.  This  rating  form  consisted  of  4  to  10  technical  dimensions 
designed  to  encompass  the  domain  of  job  performance  within  each  specialty.  Potential  dimensions 
were  identified  through  factor  analysis  of  co-performance  ratings  for  tasks  that  are  performed  by 
first-term  enlisted  personnel.  Subject-matter  experts  (SMEs)  used  this  information  in  preliminary 
workshops  to  identify  and  define  technical  dimensions,  and  to  generate  and  categorize  specific 
behavioral  examples  for  each  dimension.  In  a  series  of  follow-up  workshops,  the  set  of  dimensions  was 
reviewed,  revised,  and  confirmed,  and  the  specific  behavioral  examples  were  developed  and  revised. 
These  examples  were  then  assigned  to  dimensions  and  scale  values  through  a  standard  retranslation 
process.  The  behavioral  examples  were  developed  using  a  variant  of  the  Behavior  Summary  Scale 
(BSS)  approach  (Borman,  1979),  where  valid  SME-generated  behavioral  anchors  at  each  level  were 
combined  to  form  paragraph  descriptors  of  that  proficiency  level.  For  example,  these  paragraphs 
described  technical  effectiveness,  technical  efficiency,  and  amount  of  supervision  relevant  to  each 
proficiency  level. 

Air  Force-wide  Rating  Form.  This  rating  form  consisted  of  eight  performance  dimensions 
descriptive  of  success  across  all  Air  Force  specialties.  Because  of  this  cross-specialty  focus,  workshop 
participants were resource managers who have oversight responsibility for, and knowledge of, many
different  specialties.  Their  combined  knowledge  provided  the  details  for  constructing  a  5-point  BSS 
rating  form  applicable  across  all  specialties.  This  form  contained  a  broad  range  of  dimensions, 
including  technical  ability,  initiative/effort,  adherence  to  regulations,  leadership,  military  appearance, 
self-development,  and  self-control.  In  addition,  behavioral  examples,  specific  to  each  dimension, 
anchored  each  of  the  five  scale  values. 

Global Rating Form. This 2-item rating form consisted of an overall technical and an
overall  interpersonal  rating.  Once  again,  a  series  of  workshops  with  SMEs  from  each  specialty  was 
used  to  generate  5-point  BSS  rating  scales.  Just  as  with  the  Dimensional  Rating  Form,  the  behavioral 
examples  for  the  technical  item  depicted  technical  effectiveness,  technical  efficiency,  and  amount  of 
supervision  relevant  to  each  specialty.  The  behavioral  examples  for  the  interpersonal  item  described 
initiative,  effort,  and  teamwork  relevant  to  each  specialty. 

Procedures 

Prior  to  the  completion  of  all  rating  forms  and  questionnaires,  raters  were  introduced  to  the 
purpose  of  data  collection,  participation  requirements  were  explained,  and  they  were  familiarized  with 
each  measure  used  in  the  project.  This  orientation  session  was  followed  by  approximately  1  hour  of 
frame-of-reference  and  rater  error  training  (for  a  detailed  description  see  Bierstedt  &  Hedge,  1987). 
Immediately  following  this  group  session,  rating  booklets  were  distributed,  and  raters  were  asked  to 
complete  all  measures.  The  rating  booklets  were  organized  such  that  each  rater  completed  the 
Background  Questionnaire  followed  by  the  Global,  Dimensional,  Task,  and  Air  Force-wide  rating 
forms,  and  then  the  Rating  Form  Questionnaire.  Supervisors  were  asked  to  rate  up  to  three  job 
incumbents  under  their  supervision.  Job  incumbents  were  asked  to  rate  themselves  and/or  up  to  three 
of their co-workers. Thus, a job incumbent could be a self rater, a peer rater, or both. Regardless of
the  number  of  ratings  completed  by  a  rater,  Background  and  Rating  Form  Questionnaire  data  were 
collected  only  once  per  rater. 


III. RESULTS


Factor  Analyses 

The  10  Background  Questionnaire  items,  the  6  acceptability  items  (for  each  of  four  rating 
forms),  7  motivation  to  rate  items,  and  7  appraisal  trust  items  from  the  Rating  Form  Questionnaire 
were  factor  analyzed  separately  to  clarify  and  refine  the  hypothesized  constructs.  Each  analysis  used 
the principal components extraction technique, with orthogonal rotation of factors having eigenvalues
of 1.0 or greater to a varimax solution. These factor analyses were performed separately on supervisor
and  job  incumbent  data.  Because  acceptability  data  (six  items)  were  collected  on  each  rating  form, 
separate  factor  analyses  were  computed  for  each  form.3 
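
The extraction and rotation steps described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes NumPy, retains components of the item correlation matrix with eigenvalues of 1.0 or greater, and applies a standard SVD-based varimax iteration. The function names and the simulated seven-item data set are hypothetical.

```python
import numpy as np

def pca_loadings(data, min_eigenvalue=1.0):
    """Unrotated principal components loadings, keeping eigenvalues >= min_eigenvalue."""
    corr = np.corrcoef(data, rowvar=False)        # item intercorrelation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals >= min_eigenvalue
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix (items x factors)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p)
        )
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):  # no further improvement
            break
        criterion = new_criterion
    return loadings @ rotation

# Simulated (hypothetical) data: 500 raters, 4 items on one factor and 3 on another.
rng = np.random.default_rng(0)
factors = rng.standard_normal((500, 2))
pattern = np.array([[0.8, 0.0]] * 4 + [[0.0, 0.8]] * 3)
items = factors @ pattern.T + 0.6 * rng.standard_normal((500, 7))

unrotated = pca_loadings(items)
rotated = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each item's communality (the row sum of squared loadings) is the same before and after rotation; only the distribution of loadings across factors changes.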


3  In  each  case,  all  six  items  loaded  quite  similarly  and  strongly  on  one  acceptability  construct  across 
the  seven  specialties.  Subsequently,  separate  factor  and  regression  analyses  were  computed  using 
acceptability  data  from  each  of  the  four  rating  forms.  That  is,  eight  factor  analyses  and  eight  regression 
analyses  (four  rating  forms  by  two  sources)  were  computed.  However,  because  the  acceptability  factor 
loadings were quite similar, only the results using Task Rating Form data are presented in Tables 1, 2,
and 3. Results using the other rating form acceptability composites are available from the first author.

Following the separate questionnaire analyses, a higher order factor analysis was performed to
combine the four sets of factors into one general set of appraisal-related dimensions. Because factors
such as acceptability, trust in the appraisal process, and motivation to rate were hypothesized to be
intercorrelated, the higher order analysis employed a principal components model with the direct
oblimin method of oblique rotation. Once again, data from supervisors and job incumbents were
analyzed separately.

Tables 1 and 2 present loadings of variables on factors for job incumbents and supervisors,
respectively. Variables are ordered and grouped by size of loading to facilitate interpretation.
Loadings under .45 (20% of variance) were excluded. Nine interpretable factors were identified for job
incumbents and supervisors, although the factors were not identical across the two sources. The
interpretable 9-factor solution for the job incumbent data set included the following factors: a)
motivation to rate accurately, b) job satisfaction, c) acceptability, d) situational constraints, e) trust in
other raters, f) supervisory support, g) trust in the appraisal process, h) trust in researchers, and i)
general motivation to rate. The 9-factor solution from the supervisor data set produced eight factors (a
through h above) in common with the job incumbent set. However, supervisors did not distinguish between
general motivation to rate and motivation to rate accurately, but (unlike job incumbents) they did
distinguish between job satisfaction and "esprit de corps." High loadings across data sets and factors
suggest relatively well-defined constructs.
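
The .45 cutoff corresponds to roughly 20% shared variance because a squared loading gives the proportion of an item's variance associated with a factor. A small sketch of that arithmetic and of the exclusion rule, assuming NumPy; the helper name and the example loading matrix are hypothetical:

```python
import numpy as np

def salient_loadings(loadings, cutoff=0.45):
    """Blank out (NaN) loadings whose absolute value falls below the cutoff."""
    loadings = np.asarray(loadings, dtype=float)
    return np.where(np.abs(loadings) >= cutoff, loadings, np.nan)

# Squared loading = proportion of item variance shared with the factor.
print(round(0.45 ** 2, 4))   # 0.2025, i.e. about 20% of variance

# Hypothetical 2-item x 2-factor loading matrix.
example = salient_loadings([[0.76, 0.10], [-0.75, 0.44]])
print(example)
```

The cutoff is applied to the absolute value, so strong negative loadings (such as those reported for some trust items) are retained.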

Table 1. Principal Components Analysis of the Job Incumbent Background and Rating Form
Questionnaire.

Factor Label and Items                                  Loading

1.  Motivation to Rate Accurately
    a:  satisfied ratings were accurate                   .76
    b:  extra effort to pay attention                     .74
    c:  care about rating accuracy                        .72
    d:  important to make accurate ratings                .71
    e:  in general, accurate ratings important            .68

2.  Job Satisfaction
    a:  job is interesting                                .90
    b:  satisfied with job                                .85
    c:  sense of accomplishment from job                  .83
    d:  job important to AF mission                       .67
    e:  able to use skills/talents in job                 .65

3.  Acceptability of Rating Form
    a:  allow true picture of performers                  .83
    b:  show differences between performers               .81
    c:  acceptable to users                               .81
    d:  evaluate job proficiency fairly                   .72
    e:  easy to use and understand                        .67
    f:  instill confidence in ratings                     .45

4.  Situational Constraints
    a:  technical manuals are available                   .86
    b:  tools and equipment available                     .80
    c:  technical manuals clear/understandable            .46

5.  Trust in Other Raters
    a:  others tried to follow rating rules              -.75
    b:  others cared about accurate ratings              -.72
    c:  others gave higher ratings than deserved          .52

6.  Supervisor Support
    a:  supervisor gives support I need                   .96
    b:  supervisor concerned about well-being             .95

7.  Trust in Appraisal Process
    a:  others comfortable giving low ratings             .73
    b:  supervisor access to this information             .64

8.  General Motivation to Rate
    a:  motivated to complete rating forms                .73
    b:  rating process interesting                        .71

9.  Trust in Researchers
    a:  ratings used for research purposes                .84
    b:  true purpose of rating explained                  .71

Table 2. Principal Components Analysis of the Supervisor Background and Rating Form
Questionnaire.

Factor Label and Items                                  Loading

1.  Motivation to Rate Accurately
    a:  important to make accurate ratings                .88
    b:  care about rating accuracy                        .86
    c:  extra effort to pay attention                     .79
    d:  in general, accurate ratings important            .68
    e:  satisfied ratings were accurate                   .57

2.  Job Satisfaction
    a:  sense of accomplishment from job                  .82
    b:  able to use skills/talents in job                 .77
    c:  satisfied with job                                .69

3.  Acceptability of Rating Form
    a:  show differences between performers               .86
    b:  allow true picture of performers                  .82
    c:  acceptable to users                               .81
    d:  evaluate job proficiency fairly                   .75
    e:  easy to use and understand                        .73
    f:  instill confidence in ratings                     .57

4.  Supervisor Support
    a:  supervisor gives support I need                   .92
    b:  supervisor concerned about well-being             .89

5.  Trust in Other Raters
    a:  others tried to follow rating rules               .87
    b:  others cared about accurate ratings               .84

6.  Trust in Researchers
    a:  ratings used for research purposes               -.80
    b:  true purpose of rating explained                 -.68

7.  Situational Constraints
    a:  technical manuals are available                   .79
    b:  tools and equipment available                     .64
    c:  technical manuals clear/understandable            .50

8.  Trust in Appraisal Process
    a:  supervisor access to this information             .79
    b:  others comfortable giving low ratings             .64
    c:  others gave higher ratings than deserved          .50

9.  Esprit de Corps
    a:  job important to AF mission                      -.68
    b:  sense of pride being in AF                       -.64

Regression  Analyses 

In  an  effort  to  identify  factors  predictive  of  acceptability,  multiple  regression  analyses  were 
conducted  separately  for  supervisors  and  job  incumbents.  Based  on  the  previously-derived  factor 
solutions,  an  overall  acceptability  dependent  measure  was  formed  by  unit  weighting  the  six  items 
loading  on  that  factor  for  each  of  the  four  rating  forms.  Recall  that  attitudes  about  acceptability  of  the 
four  rating  forms  were  gathered  separately  for  each  form.  Thus,  our  overall  acceptability  measure  was 
formed  by  combining  scores  on  the  six  acceptability  items  across  the  four  rating  forms,  yielding  a 
24-item  acceptability  composite.  Likewise,  independent  measures  were  formed  by  unit  weighting  the 
items loading on each factor identified in the principal components analysis, yielding composites for
eight  supervisor  and  eight  job  incumbent  factors. 
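
Unit weighting means every item contributes equally to its composite. A minimal sketch under stated assumptions: NumPy, a simple sum as the combining rule (the report does not say whether sums or means were used), and hypothetical responses on a 1-to-5 scale.

```python
import numpy as np

def unit_weighted_composite(item_scores):
    """Sum item scores with equal (unit) weights; rows are raters, columns are items."""
    return np.asarray(item_scores, dtype=float).sum(axis=1)

# Hypothetical responses: 3 raters x 4 rating forms x 6 acceptability items.
rng = np.random.default_rng(1)
responses = rng.integers(1, 6, size=(3, 4, 6))

# Overall 24-item acceptability composite (6 items pooled across 4 forms).
overall = unit_weighted_composite(responses.reshape(3, 24))

# 6-item composite for each form separately.
per_form = responses.sum(axis=2)

print(overall)
print(per_form)
```

Under a summed scoring rule, the 24-item composite is simply the total of the four form-level 6-item composites.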

In  addition,  to  assess  the  contribution  of  Air  Force  specialty  to  variance  in  the  dependent 
measure,  specialties  were  dummy  coded  as  an  independent  variable,  and  forced  into  the  regression 
equation  first,  followed  by  the  remainder  of  the  independent  variables  entering  in  a  forward  inclusion 
manner.  The  results  of  the  multiple  regression  analysis  are  presented  in  Table  3. 
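
The entry scheme described above (a dummy-coded block forced in first, then forward inclusion of the remaining predictors) can be sketched as below. This is an illustration only: the report does not state the entry criterion, so a minimum gain in R squared (`min_gain`) is assumed here, and the function names and simulated data are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R squared from an ordinary least squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    centered = y - y.mean()
    return max(0.0, 1.0 - (resid @ resid) / (centered @ centered))

def dummy_code(labels):
    """Dummy-code a categorical variable, using its first level as the reference."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return (labels[:, None] == levels[1:]).astype(float)

def forced_then_forward(forced, candidates, y, min_gain=0.005):
    """Enter `forced` first, then add candidates one at a time by largest R squared gain.

    Returns a list of (step label, cumulative multiple R, cumulative R squared).
    """
    block = forced
    r2 = r_squared(block, y)
    steps = [("forced block", np.sqrt(r2), r2)]
    remaining = dict(candidates)
    while remaining:
        trial = {name: r_squared(np.column_stack([block, x]), y)
                 for name, x in remaining.items()}
        best = max(trial, key=trial.get)
        if trial[best] - r2 < min_gain:      # assumed stopping rule
            break
        r2 = trial[best]
        block = np.column_stack([block, remaining.pop(best)])
        steps.append((best, np.sqrt(r2), r2))
    return steps

# Simulated (hypothetical) data: 400 raters in 7 specialties.
rng = np.random.default_rng(0)
n = 400
specialty = rng.integers(0, 7, size=n)
motivation = rng.standard_normal(n)
trust = rng.standard_normal(n)
unrelated = rng.standard_normal(n)
acceptability = 0.6 * motivation + 0.3 * trust + rng.standard_normal(n)

steps = forced_then_forward(
    dummy_code(specialty),
    {"motivation": motivation, "trust": trust, "unrelated": unrelated},
    acceptability,
)
for label, r, r2 in steps:
    print(f"{label:15s} R = {r:.3f}  R squared = {r2:.3f}")
```

Each row of the output mirrors the layout of a cumulative regression table: the multiple R and R squared after that predictor has entered.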

Table 3. User Acceptability Regression Analysis by Incumbent and Supervisor.

                                                Cumulative    Cumulative
Factor                                  Beta    multiple R    R squared

Job Incumbent
    Specialty                           .018       .055          .003
    Motivation to Rate Accurately       .200       .429          .184
    Trust in Researchers                .221       .502          .252
    General Motivation to Rate          .193       .534          .286
    Trust in Other Raters               .111       .548          .300
    Trust in the Appraisal Process      .106       .557          .311
    Situational Constraints             .088       .564          .318

Supervisor
    Specialty                           .039       .077          .006
    Motivation to Rate Accurately       .236       .387          .150
    Trust in Researchers                .173       .438          .192
    Trust in Other Raters               .160       .467          .218
    Trust in the Appraisal Process      .137       .487          .237
    Esprit de Corps                     .112       .503          .253
    Situational Constraints             .091       .510          .260

For job incumbents, six factors (listed in order of entry into the regression equation) were
identified as predictors of acceptability: motivation to rate accurately, trust in
researchers, general motivation to rate, trust in other raters, trust in the appraisal process, and
situational constraints. These six measures accounted for 32% of the variance in acceptability. For
supervisors, six factors were identified as predictors of acceptability: motivation to rate accurately,
trust in researchers, trust in other raters, trust in the appraisal process, esprit de corps, and situational
constraints, which accounted for 26% of the variance in the dependent measure. In general, rater
motivation, rater trust, and situational constraints on work performance were significantly related to
acceptability in both rater groups. Supervisor support and job satisfaction variables did not account
for appreciable variance in acceptability, although supervisors did believe that feelings of esprit de corps
could influence attitudes about appraisal acceptability. These findings of modest relationships between
appraisal attitudes and job satisfaction or supervisory support are consistent with results reported by
Giles and Mossholder (1990). Finally, Air Force specialty was not an important factor in the
variance of the dependent measure (R squared of .003 and .006, or 0.3% and 0.6% of the variance in
job incumbents' and supervisors' acceptability, respectively).

Analysis of Variance

To investigate differences in acceptability across rating sources and rating forms, a
Rating Source (2) x Rating Form (4) analysis of variance (ANOVA) was computed, using the 6-item
acceptability composite as the dependent measure. Table 4 displays the results of this analysis.

Table 4. Rating Source X Rating Form Analysis of Variance.

Factor                              DF         MS          F

Between subjects
    Rating Source (S)                1       492.96       9.22*
    Subjects within groups        2101        53.46        --

Within subjects
    Rating Form (F)                  3       286.26      51.99*
    S X F                            3         9.98       1.81
    Subjects within groups        6303         5.51        --

*p < .01


The Rating Source and Rating Form main effects were statistically significant
(p < .01). Scheffé's post hoc tests for differences among means were conducted on each
significant effect. For the Source effect, supervisors were found to be more accepting of the
measurement system than job incumbents. The rating form post hoc analysis found significant mean
differences between the Task Rating Form and all other forms, with the Task Rating Form less
acceptable to raters.
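
Each F statistic in Table 4 is the ratio of an effect's mean square to the mean square of its error term: the between-subjects error for Rating Source, and the within-subjects error for Rating Form and the interaction. The short check below recomputes those ratios from the tabled mean squares; the dictionary layout and function name are illustrative only.

```python
def f_ratio(ms_effect, ms_error):
    """F statistic as the ratio of effect mean square to error mean square."""
    return ms_effect / ms_error

# (effect MS, error MS) pairs taken from Table 4.
table4 = {
    "Rating Source": (492.96, 53.46),   # tested against between-subjects error
    "Rating Form": (286.26, 5.51),      # tested against within-subjects error
    "Source x Form": (9.98, 5.51),      # tested against within-subjects error
}

ratios = {name: round(f_ratio(*ms), 2) for name, ms in table4.items()}
for name, value in ratios.items():
    print(f"{name:15s} F = {value}")
```

The recomputed values for Rating Source and the interaction match the table. The Rating Form ratio comes out at about 51.95 rather than the reported 51.99, a gap that is presumably due to the mean squares being rounded to two decimals before publication.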


IV.  DISCUSSION 

The  present  study  examined  the  concept  of  acceptability  of  performance  ratings,  and 
correlates  of  acceptability.  It  then  used  acceptability  as  a  criterion  to  assess  differences  in  rater 
perceptions  across  rating  sources  and  forms. 

Factor  analysis  identified  a  number  of  interpretable  factors:  rater  acceptability,  rater 
motivation,  job  satisfaction,  supervisor  support,  situational  constraints,  and  rater  trust.  Comparable 
factor  patterns  suggest  that  job  incumbents  and  supervisors  structure  perceptions  similarly.  High  factor 
loadings across these factors indicate relatively well-defined constructs.

It is especially interesting to note the high factor loadings for all items under the acceptability
factor. The importance of attitudes toward appraisal system equity and acceptability was noted by
Lawler (1967) over 25 years ago. More recently, Landy and his colleagues (Landy, Barnes, &
Murphy, 1978; Landy, Barnes-Farrell, & Cleveland, 1980) and Dipboye and de Pontbriand (1981)
operationalized  Lawler's  notion,  focusing  on  perceived  fairness  and  accuracy  of  the  system,  and  ratee 
attitudes  about  the  usefulness  of  the  system  and  the  process.  Our  findings  suggest  that  acceptability  is 
a  broader,  multi-faceted  construct  involving  perceptions  of  appraisal  fairness,  clarity  of  instruction, 
accuracy,  discriminability,  and  confidence. 

Regression  analyses  indicate  that  the  same  basic  information  influences  job  incumbent  and 
supervisor  acceptability  of  the  appraisal  process:  rater  motivation,  rater  trust,  and  situational 
constraints. Rater motivation and trust are variables internal to the appraisal process. As Bernardin
and his colleagues (Bernardin & Cardy, 1982; Bernardin, Orban, & Carlyle, 1981) have noted,
individual  rater  motivation  and  trust  in  the  appraisal  process  may  be  strongly  linked  to  perceived 
accuracy  and  fairness  in  appraisal.  Our  results  support  empirically  the  link  between  appraisal  process 
variables  and  acceptability.  This  suggests  that  organizations  should  foster  conditions  for  motivation 
and  trust  in  the  appraisal  process.  Rater  orientation  and  training  strategies  should  be  helpful  in  this 
regard. 


External work impediments also seem to affect perceptions of system acceptability.
Evidently, raters believe that problems with tool and technical manual availability, and technical manual
clarity, interfere with not only ratee proficiency but also raters' performance judgments. Previously,
Peters  and  O'Connor  (1980)  suggested  that  situational  constraints  affect  job  performance.  Our  findings 
suggest  that  constraints  may  also  interfere  with  rater  ability  to  judge  job  proficiency  fairly,  accurately, 
and  confidently. 

The ANOVA and post hoc test results identified differences in levels of acceptability, with the
Task Rating Form significantly less acceptable to all raters, and supervisors' perceptions of the appraisal
system more favorable than incumbents' perceptions. These results raise questions about the usefulness
of the Task Rating Form, and warrant further investigation since this finding is contrary to
expectations. It seems logical to assume that a detailed rating form would allow raters to assess
performance more accurately than a more general form, and therefore it would be more acceptable.
However, raters may dislike rating individuals on twenty-five to forty items; more is not necessarily
better. Perhaps length of time required to rate, rating specificity, or both could be the primary
reason(s) for lower acceptability.

Mean  differences  were  also  found  between  rating  sources,  with  supervisor  perceptions  of  the 
appraisal  system  more  favorable  than  incumbent  perceptions.  Why  was  the  appraisal  system  more 
acceptable  to  supervisors  than  it  was  to  job  incumbents?  Perhaps  supervisor  familiarity  with  the  rating 
process  might  affect  acceptability.  Because  of  the  nature  of  their  jobs,  supervisors  have  much  more 
experience  rating  performance  than  do  job  incumbents  and,  therefore,  might  be  more  likely  to 
understand and accept the process. Another possibility could be incumbent skepticism toward the
rating process. Since job incumbents have spent their careers being the "target" of ratings, perhaps they
are more skeptical of the process, and their perceptions of rating acceptability are lower.

Research on the measurement of job performance has been prominent in the
industrial/organizational psychology literature for many years. Most of this work has used validity,
reliability, and rating error measures. As various authors have noted, however (e.g., Bernardin &
Beatty, 1984; Jacobs, Kafry, & Zedeck, 1980; Kavanagh, 1982), there are multiple criteria to use for
judging  the  quality  of  measurement  instruments,  procedures,  and  systems.  A  relatively  uninvestigated 
criterion  is  acceptability. 

Jacobs  et  al.  (1980),  in  an  examination  of  the  behaviorally  anchored  rating  scale  (BARS) 
literature,  noted  their  own  disappointing  experiences  with  organizations  abandoning  recently-developed 
appraisal  systems.  They  suggested  that  many  organizations  revert  back  to  evaluation  systems  in  use 
prior  to  intervention  because  of  organization  policy,  and  the  excessive  personnel  time  and  energy 
requirements  associated  with  BARS. 

These  frustrations,  in  a  very  applied  way,  speak  to  the  issue  of  acceptability,  and  suggest  the 
importance  of  including  this  variable  as  a  criterion  when  evaluating  worth  of  an  appraisal  system.  After 
all,  if  a  psychometrically-sound  system  is  developed,  but  is  unacceptable  to  its  users,  it  may  never  be 
used,  or  it  might  be  used  improperly. 

This  research  has  attempted  to  clarify  the  concept  of  acceptability  and  identify  factors  related 
to  acceptability.  We  believe  that  a  rater  acceptability  criterion  can  contribute  valuable  information 
about  the  worth  of  a  particular  measurement  instrument  or  an  appraisal  system,  and  should  be  used  in 
conjunction with other, more frequently used appraisal criteria. As Banks and Murphy (1985) have
noted,  raters  must  not  only  be  capable,  but  they  must  also  be  willing  to  provide  accurate  ratings. 